ABSTRACT

We report on an informal survey about the use of software in the worldwide astronomical community. 
The survey was carried out between December 2014 and February 2015, collecting responses from 1142 
astronomers, spanning all career levels. We find that all participants use software in their research. The 
vast majority of participants, 90%, write at least some of their own software. Even though writing 
software is so widespread among the survey participants, only 8% of them report that they have
received substantial training in software development. Another 49% of the participants have received 
“little” training. The remaining 43% have received no training. We also find that astronomers’ 
software stack is fairly narrow. The 10 most popular tools among astronomers are (from most to least 
popular): Python, shell scripting, IDL, C/C++, Fortran, IRAF, spreadsheets, HTML/CSS, SQL and 
Supermongo. Across all participants the most common programming language is Python (67 ± 2%),
followed by IDL (44 ± 2%), C/C++ (37 ± 2%) and Fortran (28 ± 2%). IRAF is used frequently by 
24 ± 1% of participants. We show that all trends are largely independent of career stage, area of 
research and geographic location. 


1. INTRODUCTION 

Much of modern astronomy research depends on software.
Digital images and numerical simulations are central
to the work of most astronomers today, and anyone
who is actively involved in astronomy research has
a variety of software techniques in their toolbox.
Furthermore, the sheer volume of data has increased
dramatically in recent years. The efficient and effective use
of large data sets increasingly requires more than
rudimentary software skills. Finally, as astronomy moves
towards the open code model, propelled by pressure from
funding agencies and journals as well as the community
itself, readability and reusability of code will become
increasingly important (Figure 1). Yet we know few details
about the software practices of astronomers. In this work
we aim to gain a greater understanding of the prevalence
of software tools, the demographics of their users, and
the level of software training in astronomy.

The astronomical community has, in the past, provided
funding and support for software tools intended for the
wider community. Examples of this include the Goddard
IDL library (funded by the NASA ADP), IRAF
(supported and developed by AURA at NOAO), STSDAS
(supported and developed by STScI), and the Starlink
suite (funded by PPARC). As the field develops, new
tools are required and we need to focus our efforts on ones
that will have the widest user base and the lowest barrier
to utilization. For example, as our work here shows,
the much larger astronomy user base of Python relative
to the language R suggests that tools in the former
language are likely to get many more users and contributors
than the latter.

More recently, there has been a growing discussion of
the importance of data analysis and software development
training in astronomy (e.g., the special sessions at
the 225th AAS “Astroinformatics and Astrostatistics in
Astronomical Research Steps Towards Better Curricula”
and “Licensing Astrophysics Codes”, which were standing
room only). Although astronomy and astrophysics
went digital long ago, the formal training of astronomy
and physics students rarely involves software development
or data-intensive analysis techniques. Such skills
are increasingly critical in the era of ubiquitous “Big
Data” (e.g., Berriman & Groom (2011), or the 2015
NOAO Big Data conference). Better information on the
needs of researchers as well as the current availability
of training opportunities (or lack thereof) can be used
to inform, motivate and focus future efforts towards
improving this aspect of the astronomy curriculum.

In 2014 the Software Sustainability Institute (SSI) carried
out an inquiry into the software use of researchers in the
UK (Hettrick et al. (2014); see also the associated
presentation). This survey provides useful context for software
usage by researchers, as well as a useful definition of
“research software”:


Software that is used to generate, process or 
analyze results that you intend to appear in 
a publication (either in a journal, conference 
paper, monograph, book or thesis). Research 
software can be anything from a few lines of 
code written by yourself, to a professionally 
developed software package. Software that 
does not generate, process or analyze results 
- such as word processing software, or the use 
of a web search - does not count as research 
software for the purposes of this survey. 

However, this survey was limited to researchers at UK
institutions. More importantly, it was not focused on
astronomers, who may have quite different software
practices from scientists in other fields.

Motivated by these issues and related discussions during
the .Astronomy 6 conference, we created a survey
to explore software use in astronomy. In this paper, we
discuss the methodology of the survey in Section 2, the results
from the multiple-choice sections in Section 3 and the free-form
comments in Section 4. In Section 5 we compare our results to the
aforementioned SSI survey and in Section 6 we conclude.

We have made the anonymized results of the survey
and the code to generate the summary figures available at
https://github.com/eteq/software_survey_analysis.
This repository may be updated in the future
if a significant number of new respondents fill out the
survey.


2. DATA AND METHODS 

The survey was constructed as a Google form questionnaire
with seven questions and one comment box. Four
of the questions were about software use and inspired by
the SSI survey:

1. Do you use software in your research? 

2. Have you had formal training in software development?

3. Which of these is more common in your work? (on 
writing one’s own software) 

4. Select any of these that you use regularly to write
code for your research. (on most commonly used
software tools)

The remaining three questions requested basic demographic
information:

1. What is your field of research? 

2. What is your career stage? 

3. What is the location of your institution? 

The survey was opened on December 9, 2014. The attendees
of the .Astronomy 6 conference were asked to forward
a link to the survey to their home departments,
including a prompt to send it on to any other interested
astronomer groups. A link to the survey was also posted on
the Astronomers Facebook group. The survey received
758 responses on the first day and another 210 during
the following day. The data for this work were collected
on February 3, 2015. The number of participants at that
time was 1145. Three responses from participants who
indicated that they work in fields other than astronomy
were removed for a final tally of 1142 participants.

2.1. Survey Demographics 

The demographics of the sample at the time of collection
were the following. Of the 1142 participants, 380
are graduate students, 340 are postdocs, 385 are research
scientists and faculty (175 and 200, respectively). The
remaining 37 were undergraduate students (10), emeritus
professors, support scientists at observatories, adjunct
faculty, post-bachelor’s researchers, etc. These 37
“miscellaneous” career levels will be included in the analysis
of the full sample but will not be included in any of the
other career subgroups. For the analysis, we combine the
research scientist and faculty subgroups to create three
groups of similar size, roughly corresponding to “early”,
“intermediate” and “late” career stages.

^ http://tinyurl.com/pvyqw59 


In terms of areas of research, 823 participants chose
“Observational Astronomy/Astrophysics”, 353 selected
“Theoretical Astronomy/Astrophysics”, 130 indicated
that they work in astronomical instrumentation and 66
in planetary science. 22 participants did not choose any
of these four main categories. Of these, nine participants
selected “Other”, three did not choose an area of
research, and 10 entered custom values such as physics,
astro-statistics, cosmology, astroparticle physics, space
physics, etc. Participants were allowed to choose more
than one area of research, which is why the numbers for
the individual categories add to more than 1142.
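The effect of multiple-choice answers on category totals can be illustrated with a minimal Python sketch. The `responses` list below is a hypothetical toy sample, not survey data, and the variable names are our own; only the four category totals at the end are taken from the text above.

```python
from collections import Counter

# Hypothetical toy responses (NOT survey data): each respondent may tick
# several research areas, or none at all.
responses = [
    {"Observational"},
    {"Observational", "Theoretical"},
    {"Theoretical", "Instrumentation"},
    {"Planetary"},
    set(),  # a respondent who chose no area
]

# Count how many respondents selected each area.
area_counts = Counter(area for chosen in responses for area in chosen)
n_respondents = len(responses)
n_selections = sum(area_counts.values())

for area, count in area_counts.most_common():
    print(f"{area}: {count} of {n_respondents} respondents")
print(f"{n_selections} selections from {n_respondents} respondents")

# The same effect appears in the totals quoted in the text: the four main
# categories sum to more than the 1142 participants.
main_category_counts = [823, 353, 130, 66]
print(sum(main_category_counts), "selections from 1142 participants")
```

Because each respondent can contribute to several categories, fractions quoted per research area are fractions of respondents who ticked that area, and they need not sum to 100%.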

The final piece of demographic data we collected was
the geographic location of the participants’ home
institution. The majority of participants are from the USA
(546), followed by Germany (170), UK (90), Australia
(69) and Chile (35). 80% of the participants come from
these five top countries, with 48% from the USA. The
remaining 232 participants come from 41 different
countries. The breakdown is the following:

Netherlands (31), Sweden (21), Argentina (19),
Canada (18), Brazil (14), France (13), Spain (12), Italy
(11), Poland (9), Mexico (9), Switzerland (9), Israel (6),
Denmark (5), Finland (3), India (3), Ireland (3),
Portugal (3), Japan (3), United Arab Emirates (2), South
Korea (2), South Africa (2), Russia (2), South Korea (2),
Belgium (2), Austria (2), New Zealand (2), Greece (1),
Lithuania (1), Georgia (1), Malaysia (1), Norway (1),
Slovakia (1), Czech Republic (1), China (1), Swaziland
(1), Taiwan (1), Turkey (1), Uzbekistan (1), Hungary
(1), Vatican (1), Ghana (1).

The geographic distribution of the participants indicates
that our methods of circulating the survey were unable
to reach a broad base of researchers in Asia, Africa,
and Eastern Europe. Compared to the IAU membership,
the USA is over-represented by a factor of 2.1, Germany
by a factor of 2.7, and China is under-represented by a
factor of 60. Major contributors to this imbalance are the
language of the survey (English) and the method of
distribution (social media and friends-of-friends networks).
Hence, any conclusions we make will only be applicable
to the researchers working in the countries which are
predominantly represented.

2.2. Survey Completeness 

This survey should not be viewed as a systematically
representative sample of the astronomy community. The
request to fill out the survey was primarily spread via
social networks (i.e., departmental e-mails, Facebook,
Twitter, etc.), so it is possible that the sample of astronomers
surveyed is biased in ways that may affect the results
presented below. Given that we are astronomers (not
social scientists), we are not trained in methods to address
these effects, and therefore simply present the raw results
of the survey with no correction for selection bias. That
said, the sheer number of responses implies these results
represent a significant part of the community. Regardless,
we would certainly be happy if this work inspires
a more rigorous survey by social scientists with
domain-specific expertise.

3. RESULTS 

Fig. 1.— An xkcd.com comic strip that captures one of the problems with the lack of software training. This strip was widely shared
among astronomers on social networks, showing that the problem is known in at least part of the community.

Fig. 2.— Responses to the question “Do you use software in
your research?”. 100% of survey participants (N=1142) answered
in the affirmative.

In this section we show and describe the results from
the survey. We first focus on software use and whether
astronomers write their own software. We then examine
the training we receive in software development. Finally,
we discuss the most commonly used software tools. In
all cases we consider how career stage, research area and
geographic location alter these results. Where relevant,
we assume Poisson statistics for error bars and to provide
estimates of significance.
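As an illustration of the Poisson error bars, the following minimal Python sketch assumes that the uncertainty on a count k is sqrt(k), so that a fraction k/N carries an uncertainty of sqrt(k)/N. This reading of "Poisson statistics" is our assumption for the example, and the function name is hypothetical; the actual figure-generating code is in the repository mentioned in the Introduction. The counts used are the Python and IDL user numbers quoted in Section 3.5.

```python
import math


def percent_with_poisson_error(count, total):
    """Return (percentage, error), assuming the error on `count` is sqrt(count)."""
    if total <= 0:
        raise ValueError("total must be positive")
    percentage = 100.0 * count / total
    error = 100.0 * math.sqrt(count) / total
    return percentage, error


# Counts quoted in Section 3.5: 764 Python users and 497 IDL users
# among 1142 participants.
python_pct, python_err = percent_with_poisson_error(764, 1142)
idl_pct, idl_err = percent_with_poisson_error(497, 1142)

print(f"Python: {python_pct:.0f} ± {python_err:.0f}%")
print(f"IDL:    {idl_pct:.0f} ± {idl_err:.0f}%")
```

Under this assumption the sketch gives 67 ± 2% for Python and 44 ± 2% for IDL, matching the values quoted in the abstract. For small subsamples (e.g., single countries) the same formula naturally yields larger error bars.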

3.1. Software Use 

The first question of the survey aims to establish a
baseline of software use. The answers to the question “Do
you use software in your research?” are shown in Figure
2. Unanimously, all participants responded with “Yes”.
This unanimity is not surprising: it would be difficult
to imagine pursuing astronomical research today which
does not rely on software at least to some extent. But
the unanimity does serve to underscore the importance
of software in the field.

3.2. Do Astronomers Use Their Own Software? 

We ask the survey participants to best describe the
authorship of the software that they use. The goal of this
question is to find the predominant practice in
the community: do most of us use “black box” software
packages written by a few, or do most of us write custom
software? We find that most often we do both (Figure
3, first panel): 57 ± 2% choose this option. One third
of survey participants say they mostly write their own
software (33 ± 2%), while only a small portion of survey
participants predominantly use software written by
others: 11 ± 1%. Overall, 89% of all participants write some
of their own software.

In Figure 3 we also explore the breakdown of the
answers as a function of career stage. The answers vary
slightly between the three groups. One curious result is
that the answers of the “early” and “late” career stages
closely resemble each other. The reason for such a trend
may be that students follow their advisors’
recommendations on software practices. In both of these groups,
~ 30 ± 3% predominantly write their own software, while
~ 11 ± 2% mostly use software written by others. In
contrast, a larger portion of postdocs write their own
software: 39 ± 3%.

We further consider the breakdown of answers as a
function of research area in Figure 4. The groups are
not fully independent because participants were allowed
to choose more than one research area. In all groups the
largest portion of astronomers, > 50%, both use software
written by others and write their own. Researchers
working in theory and instrumentation are more likely
to primarily depend on their own software (42 ± 4% and
38 ± 5%, respectively) than planetary and observational
astronomers (32 ± 7% and 29 ± 2%, respectively). The
latter two groups are more likely to primarily use software
written by others, 17 ± 5% and 12 ± 1% for planetary
and observational researchers, respectively, versus 6 ± 1%
and 5 ± 2% for theory and instrumentation.

Finally, we break down the answers by country of the
researchers’ home institution as shown in Figure 5. These
and the following plots by country are more difficult to
interpret because they only contain information about the
researchers’ current institutions rather than their
institutional history. All countries show similar trends, with
> 50% of astronomers choosing the “Both” option. At
the extremes, the survey respondents from Germany are
most likely to write their own software, 38 ± 5%, and the
respondents from the UK are least likely to use software
written by others, 6 ± 3%.

In conclusion, the majority of astronomers, ~ 90%,
write at least some of their own software, across all
demographics explored in our survey, and a third of survey
participants predominantly rely on their own software.

Fig. 3.— Answers to “Which of these is more common in your work: I write mostly my own software, I mostly use software written by
others, or somewhere in between”, sub-divided by career stage (panels: All Participants, Graduate Students, Postdocs, Faculty & Scientists).

Fig. 4.— Same question as Figure 3 but divided by sub-field.


3.3. Are We Trained?

Considering that, across all demographics, ~ 90% of
astronomers are involved in writing software (Section 3.2), it is
important to assess the level of training we receive. We
allowed participants to choose from one of three levels
of training in software development: “A little”, “A lot”,
or “None”. The survey questions did not give guidelines
on how to interpret the first two categories. Rather, we
left it to the survey participants to decide whether they
thought their training was substantial or not. This
section breaks down the answers into different
demographics.

The first panel in Figure 6 shows the answers from
all participants. Overall, 8 ± 1% of survey participants
have received substantial training, 49 ± 2% have received
a little training and 43 ± 2% have received no training.
Altogether, 57 ± 2% of survey participants have received
some training in software development. Across all career
levels, only ~ 8% of astronomers have received significant
training. Faculty and scientists are slightly more likely
to have received no training at 50 ± 4% versus 40 ± 3% for
the more junior groups. Postdocs and graduate students
are slightly more likely to have received some training at
53 ± 4%, relative to faculty and scientists (42 ± 3%).

In Figure 7 we specifically focus on the training of
survey participants who, in the previous questions, said that
they primarily write their own software (“My own”
option, ~ 33% of the sample). Overall, 40 ± 3% of those
participants have received no training and 89 ± 5% have
received at best a little bit of training. The results for
this subset are consistent with the answers from the full
sample within the error bars, i.e. astronomers who
primarily write their own software do not have more
training in software development than everyone else. The
results are similar if we also consider the participants
who write some of their software (“Both” option). This
finding is key because it shows that the lack of training
is not because there is no need for such skills. Rather,
training simply does not occur. This is of particular
importance because it implies that many astronomers have
little to no training in an activity that is a major part
of their research work, despite the fact that they nearly
always have many years of post-secondary education
during which they could have received such training.
Fig. 5.— Same question as Figure 3 but sub-divided by country (for the countries with the highest number of respondents).

Fig. 6.— Answers to the question “Have you had formal training in software development: Yes, a lot; Yes, a little; No”, sub-divided by
career stage.

Fig. 7.— Answers to the question from Figure 6 but only for those who primarily write their own software (panels: All Participants,
Graduate Students, Postdocs, Faculty & Scientists).


Back to the full sample, in Figure 8 we show the
breakdown of answers as a function of research area. The
trends remain the same across all fields. The breakdown
by country (Figure 9) shows that the results are similar
internationally. The fraction of astronomers with
significant training is largely independent of geography. Some
geographical variations exist in the fraction of
participants who have at least a little training: the USA has the
largest fraction with training, 55 ± 3%, while Australia
has the smallest with 35 ± 7%. Based on these results,
we speculate that opportunities to receive at least a little
bit of training are more available at US institutions or
that more US researchers seek out such opportunities.

In conclusion, across all career levels, research areas
and countries, only a small fraction of astronomy
researchers receive significant training in software
development. The lack of a strong trend with career level may
indicate that significant training only occurs at the
undergraduate level (and some participants left comments
to that effect). While graduate students are more likely
to have had a little training, it seems that few graduate
programs offer and/or require CS courses (otherwise
junior astronomers would have a higher level of significant
training). Overall, ~ 90% of the survey participants have
received only a little bit of training at best, despite all
being software users, and most being writers of their own
software.

Fig. 8.— Same question as Figure 6 but sub-divided by sub-field.

Fig. 9.— Same question as Figure 6 but sub-divided by country.

3.4. What is in the Astronomer Software Tool Stack?

In this section we consider the most common software
tools for professional astronomers. We refer to the full
set of software tools an astronomer uses as their “stack”.
In the survey form we suggested 19 software tools and
allowed participants to add any options we missed. The
input was edited to standardize spelling and
capitalization of tools. In total, participants added 64 custom
options. 10 respondents did not provide an answer to
this question. While “C” was an option, “C++” was not
part of our suggestions. Some participants noted in the
comments that they chose “C” even though they
actually use “C++”. For this reason we consider C and C++
together in our analysis. Within the top-20 most used
software tools there are four items that were not on our
original list: C++, Mathematica, gnuplot and awk.

The overall astronomer stack is rather narrow (Figure
10, first panel). Only ten of the software tools are used
by more than 10% of the survey participants. These
are (from most popular to least popular): Python, shell
scripting, IDL, C/C++, Fortran, IRAF, spreadsheets,
HTML/CSS, SQL and Supermongo. Across all
participants the most common programming language is Python
(67 ± 2%), followed by IDL (44 ± 2%), C/C++ (37 ± 2%)
and Fortran (28 ± 2%). Shell scripting is the second most
popular tool for astronomers (47 ± 2%). The IRAF (Image
Reduction and Analysis Facility) environment is used
by 24 ± 1% of the survey participants.

Across the different career stages, we notice that
senior astronomers have a broader tool stack, i.e. they
utilize a wider variety of tools in their research. Only
eight tools are used by more than 10% of graduate
students, nine tools are used by more than 10% of
postdocs and 11 tools are used by more than 10% of faculty
and scientists. Python is the most popular tool at all
career levels, and it is most popular among junior
researchers. Four out of five graduate students use Python
(80 ± 5%), as do 70 ± 5% of postdocs and half of faculty
and scientists (53 ± 4%). IDL, IRAF and compiled
languages have a more uniform user base across all career
levels. Some tools are unique to certain demographics.
Graduate students have the highest fraction of Matlab
users (11%), while faculty and research scientists
dominate HTML/CSS (21%), Supermongo (16%) and Perl
(16%).

Unsurprisingly, software tools depend strongly on the
research area (Figure 11). Without attempting to be
exhaustive, we note some interesting differences between
fields. Observational astronomers have the highest
fractions of IDL (48 ± 2%) and IRAF (31 ± 2%) users.
Theoretical researchers have the highest fractions of compiled
language users: C/C++ with 56 ± 4% and Fortran with
50 ± 4%. Researchers in instrumentation have a high
fraction of C/C++ (52 ± 6%) and spreadsheet (28 ± 5%)
users. Other tools, however, show little field-to-field
variation. Python use is consistently high across all fields at
60-70%, as is shell scripting at ~ 50%.

Finally, in Figure 12 we consider the software stack
for researchers in different countries. Researchers in the
USA have the highest fractions of IDL (49 ± 3%) and
IRAF (25 ± 2%) users, while Australia has the lowest
fraction of users of these tools, 32 ± 7% and 12 ± 4%,
for IDL and IRAF respectively. The UK has the highest
fraction of SQL users (21 ± 5%); Germany has the highest
fraction of C/C++ users (48 ± 5%); and Australia has
the highest fraction of Matlab users (13 ± 4%). However,
these results can be strongly influenced by the research
areas represented for each country within our sample, so
we caution against drawing far-reaching conclusions.

We can also compare the USA and non-USA survey
respondents since those two samples are comparable in size
(Figure 12, second and sixth panels). Overall the
rankings and fractions of users of different tools are very
similar, as can be expected from the global mobility of many
astronomers. The only notable exceptions are IDL and R.
The fraction of IDL users in the USA is 10% larger than
that of non-USA participants. The user base of the statistical
package R is reversed: 8 ± 1% of non-USA researchers
choose this option vs. only 3 ± 1% of USA researchers.
Considering the widespread use of R in other scientific
fields, its popularity among astronomers is strikingly low.


3.5. Python vs. IDL? 

A recent shift in astronomy concerns the favored
choice of interpreted programming language for
day-to-day analysis work. In the previous section we showed
that Python has overtaken IDL in popularity. This may
not have been true three to five years ago, but today
Python is, by a wide margin, the most popular
interpreted language in astronomy (at least insofar as this
survey is representative). Still, there is a significant
overlap between the users of both languages as many people
are either transitioning from one to the other or using
both in their research. In Figure 13 we show a Venn
diagram of the Python and IDL users. In total 984 (86%)
of the survey participants use either Python or IDL. Of
those, 764 use Python and 497 use IDL. Both are chosen
by 277 or 25% of all survey participants. This indicates
substantial overlap: 36% of Python users also use IDL
and 55% of IDL users also use Python. Finally, 158
survey participants (14% of the full sample) chose neither
option.
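The bookkeeping behind the Venn diagram can be sketched in a few lines of Python using inclusion-exclusion on the counts quoted above (764 Python users, 497 IDL users, 277 users of both, 1142 participants). The function name and dictionary keys are hypothetical, introduced only for this illustration; the percentages it prints may differ by about one percentage point from those quoted in the text, depending on how the values are rounded.

```python
def two_tool_overlap(n_a, n_b, n_both, n_total):
    """Summarize the overlap between the users of two tools."""
    n_either = n_a + n_b - n_both  # inclusion-exclusion
    n_neither = n_total - n_either
    return {
        "either": n_either,
        "neither": n_neither,
        "both_share_of_a": n_both / n_a,
        "both_share_of_b": n_both / n_b,
        "both_share_of_total": n_both / n_total,
    }


# Counts quoted in the text: tool A is Python, tool B is IDL.
overlap = two_tool_overlap(n_a=764, n_b=497, n_both=277, n_total=1142)

print("Python or IDL:", overlap["either"])
print("Neither:", overlap["neither"])
print(f"Python users who also use IDL: {100 * overlap['both_share_of_a']:.1f}%")
print(f"IDL users who also use Python: {100 * overlap['both_share_of_b']:.1f}%")
print(f"Both, as a share of all participants: {100 * overlap['both_share_of_total']:.1f}%")
```

The same function can be applied to any pair of tools once the three counts are extracted from the public dataset.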


3.6. Interactive Visualization Of Software Tools

To facilitate understanding of this multi-dimensional
dataset of how use of the various software tools overlaps,
we provide an interactive visualization,
available within the Authorea version of the paper, by
downloading the software repository described in
Section 1, or at this link. In this visualization, the tools
respondents use are shown as sectors in a radial layout.
Users of multiple tools are represented as stacked sectors:
for example, the fraction of users who use only Python
and IDL is represented as the fraction of the third ring
labeled “idl” with “python” and “idl” as the lower two
layers. Hovering over that sector shows the number of
respondents to the left of the page (for Python and IDL
only it is 36). Clicking on a particular sector expands the
visualization to show only those who use the corresponding
stack of tools, while clicking on the central circle goes
back to larger segments of the survey.

Fig. 10.— Responses to the prompt “Select any of these that you regularly use in your research”, sub-divided by career stage. The
options listed included: IDL, IRAF, Python, C, Fortran, Perl, Javascript, Julia, Matlab, Java, R, SQL, Shell Scripting, STAN, Figaro,
Ruby, HTML/CSS, Supermongo (labeled “sm”), and Excel or other spreadsheets (labeled “excel”). Respondents could add additional tools
not listed using an “Other” box. Among the tools in this plot, four items were added by respondents: C++, Mathematica, gnuplot and
awk. Note that the x axis varies between panels.

Fig. 11.— Same question as Figure 10 but sub-divided by sub-field.

4. COMMENTS 

We allowed participants to leave comments at the end
of the survey. In order to increase the anonymity of
comments we detached them from the answers to the other
questions (aside from career stage). We further removed
e-mails, names, or other identifying information from the
content of the comments. If anyone who took the
survey would prefer that their comments not be included,
we ask them to contact us and we would be happy to
remove the information from the public dataset.

We see three recurring topics in the comments. The 
first common comment topic is the switch from IDL to 
Python. Many users comment that they would like to 
or are planning to make the switch from IDL to Python, 
frequently because of licensing issues and costs. We find 
it particularly striking that several senior astronomers 
commented that they are learning Python to be more 
helpful to their students. 

The second common comment topic is the desire for
more opportunities to improve software development
skills. Many participants voiced interest in attending
software development classes for astronomers. Several
suggested that classes in programming and statistics
should be an integral part of the undergraduate and/or
graduate curriculum in astronomy. Participants
suggested that such training not only aids in better research
efficiency, but also can make code more readable and
reusable.

Fig. 12.— Same question as Figure 10 but sub-divided by country (for the countries with the highest number of respondents; panels
include United States, United Kingdom, Germany, Australia and Non-USA).

Fig. 13.— Venn Diagram of the overlap between respondents
who reported using Python, IDL, or neither.

The third recurring topic in the comments is the lack
of career opportunities for astronomers who write
software. Comments suggest that more professional
recognition should be given to those who spend most of their
time developing important tools for the astronomical
community and that such efforts should be recognized
in hiring and explicitly funded.

We cite some of the representative comments on each
topic in the Appendix.


5. COMPARISON TO SSI SURVEY 

This work was inspired by a UK survey led by the 
Software Sustainability Institute. While their findings 
are for the wider scientific community in the UK and 
ours are for the worldwide astronomical community, a 
comparison is still interesting. 

The SSI survey finds that 90% of UK scientists use
software in their research. Our survey shows that
astronomy is in line with other sciences in the use of
software - 100% of astronomers use it. 90% of astronomers
write some software (93% of UK astronomers). This is
much larger than in the SSI survey, which finds that only
56% of researchers, across all disciplines, write some of
their software. This implies astronomers are much more
dependent on their own software than researchers in other sciences.

In the SSI survey, 55% of respondents say they have
received some training in software development, with 40%
indicating that the training was a formal course and
15% indicating self-directed study. Our categories are
not identical, but we find that a similar fraction of
astronomers - 57% (53% of UK astronomers) - say that
they have received some form of training. However, only
8% say that the training was substantial, while 49% say
they received a little training. The decision of what
constitutes a lot and a little was left to the participants.
Those who chose to expand on their decision indicated
that a lot corresponded to a formal class while a little
corresponded to using on-line materials such as Software
Carpentry and Code Academy.

Finally, 40% of our survey respondents who predominantly 
write their own software have received no training 
(42% of those who write some of their own software). 
This fraction is roughly twice as large as the one reported in 
the SSI survey (21%) and may indicate that in astronomy 
there are fewer efforts and opportunities to train researchers 
in software development. This is all the more 
surprising given that many more astronomers write their 
own code, according to these surveys. 
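
The comparison above can be laid out as a short calculation. The following is a minimal sketch in Python that uses only the percentages quoted in this section; it does not recompute anything from the raw responses of either survey. The dictionary keys and the function name `compare` are illustrative labels, not names used by either survey.

```python
# Percentages as quoted in the text of this section (rounded values,
# not recomputed from the raw responses of either survey).
QUOTED = {
    # share of respondents who write at least some of their own software
    "write_software": {"astronomy": 90, "ssi": 56},
    # share of respondents reporting some form of training
    "some_training": {"astronomy": 57, "ssi": 55},
    # share of software writers reporting no training
    "no_training_among_writers": {"astronomy": 40, "ssi": 21},
}


def compare(quoted):
    """Return {name: (difference in percentage points, ratio)} for each quantity."""
    result = {}
    for name, pct in quoted.items():
        diff = pct["astronomy"] - pct["ssi"]
        ratio = pct["astronomy"] / pct["ssi"]
        result[name] = (diff, ratio)
    return result


if __name__ == "__main__":
    for name, (diff, ratio) in compare(QUOTED).items():
        print(f"{name}: {diff:+d} points, ratio {ratio:.2f}")
```

Because the two surveys used different categories and sampled different populations, these differences and ratios are only indicative.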

6. CONCLUSIONS 

Based on the responses summarized in Section we 
come to the following conclusions: 
• Unsurprisingly, all 1142 survey participants say 
that they use software in their research (Figure 2). 
The unanimous answer to this question 
underscores the importance of understanding how 
astronomers use software (i.e., the purpose of this 
survey). 

• 89% of astronomers across all demographics write 
their own software (Figure). However, only 58% 
of those who write software have received any training 
in software development. Moreover, only 8% self-report 
as having better than “a little” training. From this we 
conclude that 42% of astronomers who write software 
have no training for a key element of their work, and 
92% have at most “a little” training. 


• Python is the dominant language among our respondents. 
Surprisingly, this is true across all 
career stages (Figure 10). While a commonly-expressed 
mindset, reflected in some of the comments, 
is that graduate students are more likely to 
know the newer languages, it appears that this is 
only mildly true, at least in our survey. 


• Astronomers have a fairly narrow software “stack”, 
with only 10 tools used by more than 10% of respondents. 
Theorists tend to have a narrower 
stack relative to other fields (Figure 11), as do graduate 
students relative to more senior researchers. 
Independent of career level and field, Python and 
shell scripting are the most popular tools for astronomers. 
These results show that training efforts 
can have a significant impact even if they only focus 
on a limited number of software tools. We suggest 
that the rankings we produce can help in choosing 
training topics that would be most useful for the 
broadest group of participants. 
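
The arithmetic behind the second conclusion above is a pair of complements. The sketch below, in Python, uses only the two percentages quoted in that bullet. The helper `complement` and the constant names are illustrative, and the sketch assumes both percentages refer to the same group of respondents.

```python
def complement(percent: int) -> int:
    """Share of the same group that falls outside the quoted category."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie in [0, 100]")
    return 100 - percent


# Quoted in the second conclusion (rounded survey percentages).
TRAINED = 58             # software writers with any training
MORE_THAN_A_LITTLE = 8   # respondents reporting better than "a little" training

no_training = complement(TRAINED)                  # no training at all
at_most_a_little = complement(MORE_THAN_A_LITTLE)  # none or "a little"

print(f"no training: {no_training}%")
print(f"at most 'a little' training: {at_most_a_little}%")
```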

We caution that these results are tentative because our 
sampling methodology was not robust. If nothing else, 
we hope this survey will prompt a more formal study 
of software use in astronomy to better understand how 
we should use the limited resources of our community to 
improve software training and software use. 
APPENDIX 

PARTICIPANT COMMENTS 

The section below shows a representative sampling of free-form comments left by the survey participants. 

On switching from IDL to Python: 

• I recently switched from using IDL as my primary programming language to using Python. (Graduate student) 

• Mainly IDL user who wants to switch to Python as it is more open source. (Graduate student) 

• I learned IDL as an undergrad and continue to mostly code in it as a grad student. However I’ve been learning 
Python lately and plan to mostly switch over within the next year or so. (Graduate student) 

• I plan to learn Python but haven’t yet worked with it. (Graduate student) 

• At this stage I see Python as the future and am rapidly moving away from IDL. (Graduate student) 

• I learned IDL as an undergrad (class of 2004) and used it nearly exclusively [...] until about two years ago. Over 
the last two years I’ve been slowly switching to Python [...] (Postdoc) 

• I’ve only recently started working in IDL and Python. I expect to do quite a bit of development in Python from 
now on. (Postdoc) 

• I want to learn Python and R as soon as possible (Postdoc) 

• While I haven’t learned it yet many of my colleagues use Python and I make all of my students learn that (instead 
of e.g. Matlab). (Faculty) 

• I am telling all of my students to learn Python and through that I am also gaining proficiency in Python. This 
is different from what my advisor did. He told me to use the language that he used so that he could help me 
debug it. (Faculty) 

• Moving to Python as IDL needs a license... And I just like the language. (Faculty) 

• Currently I code in IDL but I am trying to switch to Python. (Faculty) 

• I plan to switch from IDL to Python over the next 2-3 years. (Faculty) 

• I’m intending to try out Python soon. (Faculty) 
• Taking a workshop on Python soon I can encourage and help my students learn a language that can be used 
outside of academia as well. (Faculty) 

On the desire for more opportunities to improve software development skills: 

• If there is any software development training program for astronomers I would love to attend. (Graduate student) 

• I think it should be strongly recommended that people going into astronomy should take programming classes. 
In fact I would make it part of the required course work to get a B.S. in Physics or Astronomy. Most astronomers 
I know did not take any formal programming course we learned as we went along. Most of us write our own 
code and many do not use good coding practices making reading or adapting code from other astronomers a lot 
more painful than it should be. (Graduate student) 

• Helps to include courses in computation and statistics in grad curriculum. (Graduate student) 

• Astro students should get more formal training in programming and software design (and it should happen at 
the undergraduate level whenever possible). (Graduate student) 

• I wish I could get more formal training on programming. I have the feeling that Astronomy and Physics department 
usually don’t emphasize the importance of programming until people start doing researches. (Graduate 
student) 

• The coding skills incoming graduate students possess seem to vary wildly but they are often dismal with respect 
to the level required to begin doing serious research right away. In my department there seems to be little 
motivation to rectify this with either: (1) requiring undergraduate CS preparation as a condition for admission; 
or (2) organizing a programming course for beginning graduate students. This situation seems if not unsustainable 
very much non-optimal for the cultivation of strong substantive independent research skills. I imagine many other 
departments are currently facing the same dilemma. (Graduate student) 

• More formal training in software use/development would be wonderful. (Graduate student) 

• Formal training on astronomical software packages as part of my astronomy undergraduate degree would have 
been very helpful for me. (Graduate student) 

• We need to be teaching undergraduates and graduates good object-oriented design skills from day one. Software 
is more important now than it ever was and the “learn as you go” mentality causes a tremendous amount of 
wasted efforts as bad code has to be rewritten all too often with the side effect of having astronomers with career 
skills that aren’t as well developed as they could have been should they choose to leave astronomy. Also we 
should make a concerted effort to rewrite some of our legacy tools (IDL libraries, IRAF, etc.) in a language and 
style that is more easily and cleanly extended and maintained. Incentives to put in the time and money for these 
[initially] low impact projects are hard to come by though. (Graduate student) 

On the lack of career opportunities for people who write software: 

• People who develop software that the community use should be recognized more for their efforts! (Research 
Scientist) 

• It would be great to have more career options for researchers who focus on software development in the astrophysical 
community. Too many good researchers who contribute lots of great software to the community have 
been forced out of the field because of lack of recognition for their work and lack of funding for people other than 
those who publish several science papers a year. (Postdoc) 

• Software development is evidently as important a tool in modern science as mathematics and just as it has 
historically not been deemed wise to outsource all mathematics to professional mathematicians I believe a large 
fraction of scientific software development will have to be accomplished by scientists who are intimately familiar 
with the problem at hand. Perhaps more than is the case for mathematics though the paper metric used for 
hiring scientists often pushes excellent software developers out of science and into industry who are then lost to 
us. (Postdoc) 

• I find that my observational colleagues are often unaware that we as computational scientists need to write 
proposals for supercomputers like they write for telescopes. Also when we write science proposals (e.g., NASA, 
NSF) we have to lie about how much time we will spend developing code say only a few months when in reality it 
occupies most of the grant period since code development is frowned upon (except within the new NASA ROSES 
program PDART started in 2014). (Research Scientist) 